Learning Blind Motion Deblurring
As handheld video cameras are now commonplace and available in every
smartphone, images and videos can be recorded almost anywhere at any time.
However, taking a quick shot frequently yields a blurry result due to unwanted
camera shake during recording or moving objects in the scene. Removing these
artifacts from the blurry recordings is a highly ill-posed problem as neither
the sharp image nor the motion blur kernel is known. Propagating information
between multiple consecutive blurry observations can help restore the desired
sharp image or video. Solutions for blind deconvolution based on neural
networks rely on a massive amount of ground-truth data which is hard to
acquire. In this work, we propose an efficient approach to produce a
significant amount of realistic training data and introduce a novel recurrent
network architecture that deblurs frames by taking temporal information into
account and efficiently handles arbitrary spatial and temporal input sizes. We
demonstrate the versatility of our approach in a comprehensive comparison on a
number of challenging real-world examples.
Comment: International Conference on Computer Vision (ICCV), 2017.
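The abstract leaves the recurrent architecture unspecified; the sketch below only illustrates the stated property that a fully convolutional recurrent model handles arbitrary spatial sizes (no fixed input resolution) and arbitrary temporal lengths (an explicit loop over frames). All module names and layer sizes are illustrative assumptions, not the paper's design.

```python
# Minimal sketch (not the authors' architecture): a fully convolutional
# recurrent deblurring step. Every layer is convolutional, so any H x W
# works, and the explicit temporal loop accepts any sequence length.
import torch
import torch.nn as nn

class RecurrentDeblurCell(nn.Module):
    def __init__(self, feat=32):
        super().__init__()
        self.encode = nn.Sequential(nn.Conv2d(3 + feat, feat, 3, padding=1), nn.ReLU())
        self.state = nn.Conv2d(feat, feat, 3, padding=1)  # recurrent features
        self.decode = nn.Conv2d(feat, 3, 3, padding=1)    # residual estimate

    def forward(self, blurry, hidden):
        h = self.encode(torch.cat([blurry, hidden], dim=1))
        hidden = torch.tanh(self.state(h))
        return blurry + self.decode(h), hidden            # predicted sharp frame

def deblur_sequence(frames, cell, feat=32):
    """frames: list of (B, 3, H, W) tensors of any length and resolution."""
    b, _, h, w = frames[0].shape
    hidden = frames[0].new_zeros(b, feat, h, w)
    outputs = []
    for f in frames:                                      # temporal recurrence
        sharp, hidden = cell(f, hidden)
        outputs.append(sharp)
    return outputs
```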
Efficient Large-scale Approximate Nearest Neighbor Search on the GPU
We present a new approach for efficient approximate nearest neighbor (ANN)
search in high-dimensional spaces, extending the idea of Product Quantization.
We propose a two-level product and vector quantization tree that reduces the
number of vector comparisons required during tree traversal. Our approach also
includes a novel highly parallelizable re-ranking method for candidate vectors
by efficiently reusing already computed intermediate values. Due to its small
memory footprint during traversal, the method lends itself to an efficient,
parallel GPU implementation. This Product Quantization Tree (PQT) approach
significantly outperforms recent state-of-the-art methods for high-dimensional
nearest neighbor queries on standard reference datasets. Ours is the first work
that demonstrates GPU performance superior to CPU performance on
high-dimensional, large-scale ANN problems in time-critical real-world
applications, like loop closing in videos.
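For context, the NumPy sketch below shows plain product quantization, the building block the paper extends into a two-level product and vector quantization tree. The tree traversal and the GPU re-ranking step are not shown, and all function names and parameters are illustrative.

```python
# Plain product quantization: split each D-dim vector into M sub-vectors,
# quantize each against its own K-entry codebook, and approximate distances
# with M table lookups per database vector. Requires D divisible by M.
import numpy as np

def train_codebooks(data, M=8, K=256, iters=10, seed=0):
    """data: (N, D) float array; returns codebooks of shape (M, K, D // M)."""
    rng = np.random.default_rng(seed)
    D = data.shape[1]
    sub = data.reshape(len(data), M, D // M)
    books = []
    for m in range(M):                      # independent k-means per sub-space
        c = sub[rng.choice(len(data), K, replace=False), m]
        for _ in range(iters):
            a = np.argmin(((sub[:, m, None] - c) ** 2).sum(-1), axis=1)
            for k in range(K):
                if (a == k).any():
                    c[k] = sub[a == k, m].mean(0)
        books.append(c)
    return np.stack(books)

def encode(x, books):
    """x: (D,) vector -> (M,) code words, one per sub-space."""
    M, K, d = books.shape
    parts = x.reshape(M, d)
    return np.array([np.argmin(((books[m] - parts[m]) ** 2).sum(-1))
                     for m in range(M)])

def adc_distance(query, codes, books):
    """Asymmetric distance between a raw query and an encoded database vector."""
    M, K, d = books.shape
    q = query.reshape(M, d)
    tables = ((books - q[:, None]) ** 2).sum(-1)   # (M, K) partial distances
    return tables[np.arange(M), codes].sum()
```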
GGNN: Graph-based GPU Nearest Neighbor Search
Approximate nearest neighbor (ANN) search in high dimensions is an integral
part of several computer vision systems and gains importance in deep learning
with explicit memory representations. Since PQT and FAISS started to leverage
the massive parallelism offered by GPUs, GPU-based implementations are a
crucial resource for today's state-of-the-art ANN methods. While most of these
methods allow for faster queries, less emphasis is devoted to accelerating the
construction of the underlying index structures. In this paper, we propose a
novel search structure based on nearest neighbor graphs and information
propagation on graphs. Our method is designed to take advantage of GPU
architectures to accelerate the hierarchical building of the index structure
and for performing the query. Empirical evaluation shows that GGNN
significantly surpasses the state-of-the-art GPU- and CPU-based systems in
terms of build-time, accuracy and search speed.
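The sketch below illustrates generic greedy best-first search on a nearest neighbor graph, the common principle behind graph-based ANN indices; GGNN's hierarchical GPU-side construction and parallel traversal are not shown, and all names and parameters are illustrative.

```python
# Greedy best-first search on a precomputed k-NN graph: walk from an entry
# point toward the query, expanding the closest unexplored node until the
# frontier cannot improve the current top-ef candidate set.
import heapq
import numpy as np

def greedy_graph_search(query, vectors, neighbors, entry=0, k=10, ef=64):
    """vectors: (N, D) floats; neighbors: (N, deg) int array of graph edges.
    Returns k approximate nearest neighbor indices for `query`."""
    dist = lambda i: float(((vectors[i] - query) ** 2).sum())
    visited = {entry}
    candidates = [(dist(entry), entry)]   # min-heap: frontier to explore
    best = [(-dist(entry), entry)]        # max-heap: current top-ef results
    while candidates:
        d, node = heapq.heappop(candidates)
        if d > -best[0][0] and len(best) >= ef:
            break                          # frontier is worse than top-ef
        for nb in neighbors[node]:
            if nb in visited:
                continue
            visited.add(nb)
            dn = dist(nb)
            if len(best) < ef or dn < -best[0][0]:
                heapq.heappush(candidates, (dn, nb))
                heapq.heappush(best, (-dn, nb))
                if len(best) > ef:
                    heapq.heappop(best)    # drop the worst candidate
    return [i for _, i in sorted((-d, i) for d, i in best)[:k]]
```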
Motion Deblurring in the Wild
The task of image deblurring is a very ill-posed problem as both the image
and the blur are unknown. Moreover, when pictures are taken in the wild, this
task becomes even more challenging due to the blur varying spatially and the
occlusions between objects. Due to the complexity of the general image model,
we propose a novel convolutional network architecture which directly generates
the sharp image. This network is built in three stages and exploits the
benefits of pyramid schemes often used in blind deconvolution. One of the main
difficulties in training such a network is to design a suitable dataset. While
useful data can be obtained by synthetically blurring a collection of images,
more realistic data must be collected in the wild. To obtain such data we use a
high frame rate video camera and keep one frame as the sharp image and the
average of consecutive frames as the corresponding blurred image. We show that
this realistic dataset is key in achieving state-of-the-art performance and in
dealing with occlusions.
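The data collection described above fits in a very small sketch: average a window of consecutive high frame rate frames to approximate the motion-blurred input, and keep a single frame as the sharp target. The window size and frame choice below are illustrative, not the paper's exact protocol.

```python
# Build a (blurred, sharp) training pair from a high frame rate video clip.
import numpy as np

def make_blur_pair(frames, center, window=7):
    """frames: (T, H, W, 3) uint8 video; center: index of the sharp frame.
    Assumes half a window of frames exists on each side of `center`."""
    half = window // 2
    clip = frames[center - half:center + half + 1].astype(np.float32)
    blurred = clip.mean(axis=0).astype(np.uint8)  # temporal average ~ motion blur
    sharp = frames[center]                        # reference sharp frame
    return blurred, sharp
```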
Learning Robust Video Synchronization without Annotations
Aligning video sequences is a fundamental yet still unsolved component for a
broad range of applications in computer graphics and vision. Most classical
image processing methods cannot be directly applied to related video problems
due to the large amount of underlying data and their limitation to small changes in
appearance. We present a scalable and robust method for computing a non-linear
temporal video alignment. The approach autonomously manages its training data
for learning a meaningful representation in an iterative procedure, each time
increasing its own knowledge. It leverages the nature of the videos themselves
to remove the need for manually created labels. While previous alignment
methods require similar weather conditions, season and illumination, our
approach is able to align videos from data recorded months apart.
Comment: International Conference on Machine Learning and Applications (ICMLA), 2017.
Flex-Convolution: Million-Scale Point-Cloud Learning Beyond Grid-Worlds
Traditional convolution layers are specifically designed to exploit the
natural data representation of images -- a fixed and regular grid. However,
unstructured data like 3D point clouds containing irregular neighborhoods
constantly breaks the grid-based data assumption. Therefore, applying best
practices and design choices from 2D image learning methods to point-cloud
processing is not readily possible. In this work, we introduce flex-convolution,
a natural generalization of the conventional convolution layer,
along with an efficient GPU implementation. We demonstrate competitive
performance on rather small benchmark sets using fewer parameters and lower
memory consumption and obtain significant improvements on a million-scale
real-world dataset. Ours is the first method that can efficiently process 7
million points concurrently.
Comment: accepted at ACCV 2018.
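As a rough illustration of the idea, the sketch below computes, for every point, a weighted sum over its irregular neighborhood where the kernel weight is a learned linear function of the relative point position; on a regular grid this reduces to an ordinary convolution. The shapes and the exact weight parametrization are simplifying assumptions, not the paper's full layer.

```python
# Flex-convolution-style layer over an irregular point neighborhood.
import numpy as np

def flex_conv(points, feats, neighbors, theta, bias):
    """points:    (N, 3)         point positions
    feats:     (N, Cin)          input features
    neighbors: (N, k)            indices of each point's k nearest neighbors
    theta:     (Cin, Cout, 3)    direction of each learned linear weight fn
    bias:      (Cin, Cout)       offset of each learned linear weight fn
    returns    (N, Cout) output features."""
    rel = points[neighbors] - points[:, None, :]      # (N, k, 3) offsets
    # per-neighbor kernel weights, linear in the offset: (N, k, Cin, Cout)
    w = np.einsum('nkd,iod->nkio', rel, theta) + bias
    # weighted sum of neighbor features over the irregular neighborhood
    k = neighbors.shape[1]
    return np.einsum('nkio,nki->no', w, feats[neighbors]) / k
```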
Will People Like Your Image? Learning the Aesthetic Space
Rating how aesthetically pleasing an image appears is a highly complex matter
and depends on a large number of different visual factors. Previous work has
tackled the aesthetic rating problem by ranking on a 1-dimensional rating
scale, e.g., incorporating handcrafted attributes. In this paper, we propose a
rather general approach to automatically map aesthetic pleasingness with all
its complexity into an "aesthetic space" to allow for a highly fine-grained
resolution. In detail, making use of deep learning, our method directly learns
an encoding of a given image into this high-dimensional feature space
resembling visual aesthetics. In addition to the mentioned visual factors,
differences in personal judgments have a large impact on the likeableness of a
photograph. Nowadays, online platforms allow users to "like" or favor certain
content with a single click. To incorporate a huge diversity of people, we make
use of such multi-user agreements and assemble a large data set of 380K images
(AROD) with associated meta information and derive a score to rate how visually
pleasing a given photo is. We validate our derived model of aesthetics in a
user study. Further, without any extra data labeling or handcrafted features,
we achieve state-of-the-art accuracy on the AVA benchmark data set. Finally, as
our approach is able to predict the aesthetic quality of any arbitrary image or
video, we demonstrate our results on applications for re-sorting photo
collections, capturing the best shot on mobile devices, and aesthetic key-frame
extraction from videos.
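The abstract does not state how the score is derived from the multi-user "like" data; the sketch below shows one plausible way to turn per-image statistics into a pleasingness score by relating faves to views on a log scale. The exact formula is an assumption, not necessarily the one used for AROD.

```python
# Hypothetical pleasingness score from raw platform statistics: a
# faves-to-views ratio on a log scale, damping raw popularity effects.
import math

def aesthetic_score(faves: int, views: int) -> float:
    """Relates how often an image was favored to how often it was seen."""
    if views <= 1:
        return 0.0
    return math.log(1 + faves) / math.log(1 + views)  # roughly in [0, 1]
```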